German Compounds in Factored Statistical Machine Translation

Author

  • Sara Stymne
Abstract

This paper explores an empirical method for splitting German compounds, varying it in several ways to investigate the consequences for factored statistical machine translation between English and German in both directions. Compound splitting is incorporated into translation as a preprocessing step, applied to the training data and to German translation input. For translation into German, compounds are merged based on part-of-speech in a postprocessing step. Compound parts are marked to distinguish them from ordinary words. Translation quality improves in both translation directions, and the number of untranslated words in the English output is reduced. Different versions of the splitting algorithm perform best in the two translation directions.
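The pipeline described in the abstract (split, mark parts, merge by part-of-speech) can be illustrated with a minimal sketch. The abstract does not specify the splitting criterion, so this example assumes a corpus-frequency approach in which the split whose parts have the highest geometric mean frequency wins. The names `split_compound`, `mark_parts`, `merge_compounds`, the `N-PART` tag, and the toy frequency table are all hypothetical and not taken from the paper. Linking elements (such as the German "s" between parts) and capitalization are ignored for brevity.

```python
from math import prod

PART_TAG = "N-PART"  # hypothetical POS tag marking a non-final compound part


def split_compound(word, freq, min_len=3):
    """Pick the segmentation of `word` with the highest geometric mean
    of part frequencies (assumed criterion); keep the word if none beats it."""
    word = word.lower()

    def segmentations(rest):
        # Enumerate every way to cut `rest` into parts that occur in `freq`.
        if not rest:
            yield []
            return
        for i in range(min_len, len(rest) + 1):
            head = rest[:i]
            if head in freq:
                for tail in segmentations(rest[i:]):
                    yield [head] + tail

    best_parts, best_score = [word], float(freq.get(word, 0))
    for parts in segmentations(word):
        if len(parts) < 2:
            continue  # the unsplit word is already the baseline
        score = prod(freq[p] for p in parts) ** (1 / len(parts))
        if score > best_score:
            best_parts, best_score = parts, score
    return best_parts


def mark_parts(parts, head_pos="N"):
    """Tag non-final parts so they stay distinct from ordinary words."""
    return [(p, PART_TAG) for p in parts[:-1]] + [(parts[-1], head_pos)]


def merge_compounds(tagged, head_pos="N"):
    """Postprocessing: join marked parts with the following head word."""
    out, pending = [], ""
    for token, pos in tagged:
        if pos == PART_TAG:
            pending += token
        elif pending and pos == head_pos:
            out.append(pending + token)
            pending = ""
        else:
            if pending:  # a dangling part with no head is emitted as-is
                out.append(pending)
                pending = ""
            out.append(token)
    if pending:
        out.append(pending)
    return out


# Toy frequency table (invented counts, for illustration only).
FREQ = {"aktien": 50, "gesellschaft": 80, "aktiengesellschaft": 10, "haus": 5}

parts = split_compound("Aktiengesellschaft", FREQ)
tagged = [("die", "ART")] + mark_parts(parts)
merged = merge_compounds(tagged)
```

In this sketch the splitter is applied before training and on German input, while the merger would run on German output only. Varying `min_len` or the scoring function is one way to obtain the different splitting variants the abstract alludes to.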


Related resources

Getting to Know Moses: Initial Experiments on German-English Factored Translation

We present results and experiences from our experiments with phrase-based statistical machine translation using Moses. The paper is based on the idea of using an off-the-shelf parser to supply linguistic information to a factored translation model and comparing the results of German–English translation to the shared task baseline system based on word form. We report partial results for this model ...


German Compounds and Statistical Machine Translation. Can they get along?

This paper reports different experiments created to study the impact of using linguistics to preprocess German compounds prior to translation in Statistical Machine Translation (SMT). Compounds are a known challenge both in Machine Translation (MT) and Translation in general as well as in other Natural Language Processing (NLP) applications. In the case of SMT, German compounds are split into t...


Statistical Machine Translation of German Compound Words

German compound words pose special problems to statistical machine translation systems: the occurrence of each of the components in the training data is not sufficient for successful translation. Even if the compound itself has been seen during training, the system may not be capable of translating it properly into two or more words. If German is the target language, the system might generate on...


Morphological Processing of Compounds for Statistical Machine Translation

Machine Translation denotes the translation of a text written in one language into another language performed by a computer program. In times of internet and globalisation, there has been a constantly growing need for machine translation. For example, think of the European Union, with its 24 official languages into which each official document must be translated. The translation of official doc...


Statistical Machine Translation with Factored Translation Model: MWEs, Separation of Affixes, and Others

This paper discusses Statistical Machine Translation when the target side is a morphologically richer language. It addresses issues that are not covered by the factored translation model of Moses, especially targeting EN–JP translation: the effect of MultiWord Expressions, the separation of affixes, and other monolingual morphological issues. We intend to discuss these over a ...


Publication date: 2008